feat(observability): verification lifecycle metrics, correlation IDs, and audit logs#58
Co-authored-by: chrismaz11 <24700273+chrismaz11@users.noreply.github.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c056bb3c16
      riskScore: signedReceipt.riskScore
    });

    receiptsIssuedTotal.inc({ decision: signedReceipt.decision, policy_profile: input.policy.profile });
Bound policy profile labels before incrementing metrics
In the /api/v1/verify handler, deedshield_receipts_issued_total is incremented with policy_profile: input.policy.profile, which comes directly from request input and is only validated as a non-empty string. Any caller with verify scope can therefore generate unbounded (or very long) label values, causing high-cardinality time series growth and increased Prometheus memory/scrape cost that can degrade observability for the service. Restrict this label to a bounded allowlist (or remove it) before calling .inc.
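One way to apply this suggestion is to map the request-supplied profile onto a fixed set of label values before it reaches the counter. The sketch below is only an illustration: `ALLOWED_POLICY_PROFILES`, `OTHER_PROFILE`, and `boundPolicyProfile` are hypothetical names, and the allowlist contents are an assumption (only `STANDARD_IL` appears in this PR's examples).

```typescript
// Hypothetical allowlist; the real set of policy profiles is not given in this PR.
const ALLOWED_POLICY_PROFILES: ReadonlySet<string> = new Set(["STANDARD_IL"]);

// Fallback label value for anything outside the allowlist (assumed name).
const OTHER_PROFILE = "other";

/** Map an arbitrary request-supplied profile to a bounded metric label value. */
function boundPolicyProfile(profile: unknown): string {
  if (typeof profile === "string" && ALLOWED_POLICY_PROFILES.has(profile)) {
    return profile;
  }
  return OTHER_PROFILE;
}
```

With a helper like this, the increment could become `receiptsIssuedTotal.inc({ decision: signedReceipt.decision, policy_profile: boundPolicyProfile(input.policy.profile) })`, so the number of distinct `policy_profile` series is capped at the allowlist size plus one.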
Pull request overview
Adds observability for the verification lifecycle in the API, plus supporting monitoring assets (Prometheus rules + Grafana dashboard) and tests for correlation IDs/metrics exposure.
Changes:
- Add new Prometheus business metrics (counters + histogram) for receipt issuance, verification outcomes, revocations, and end-to-end verify latency.
- Propagate Fastify `request.id` via `x-request-id` and emit structured audit log events for key lifecycle actions.
- Extend alert rules, Grafana dashboard panels, and monitoring docs; add a Vitest suite for correlation IDs and metrics exposure.
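The correlation ID propagation described in the changes above can be sketched roughly as follows. This is not the code from `server.ts`: `RequestLike`, `ReplyLike`, `RecordingReply`, and `propagateRequestId` are hypothetical stand-ins so the example runs without Fastify installed.

```typescript
// Minimal stand-ins for the request/reply objects; these are NOT Fastify's types.
interface RequestLike {
  id: string;
}

interface ReplyLike {
  header(name: string, value: string): ReplyLike;
}

/** Copy the server-generated request ID onto the response as x-request-id. */
function propagateRequestId(request: RequestLike, reply: ReplyLike): void {
  reply.header("x-request-id", request.id);
}

// Tiny in-memory reply used only to exercise the function.
class RecordingReply implements ReplyLike {
  readonly headers: Record<string, string> = {};

  header(name: string, value: string): ReplyLike {
    this.headers[name.toLowerCase()] = value;
    return this;
  }
}

const reply = new RecordingReply();
propagateRequestId({ id: "req-42" }, reply);
console.log(reply.headers["x-request-id"]); // prints "req-42"
```

In a Fastify app, a function of this shape would presumably be called from an `onSend` hook registered with `addHook`, which receives the real request and reply objects.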
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| apps/api/src/server.ts | Adds lifecycle metrics, x-request-id propagation, and structured audit logs in verify/verify-receipt/revoke flows. |
| apps/api/src/observability.test.ts | New tests validating x-request-id presence/uniqueness and metrics endpoint output. |
| docs/ops/monitoring/alert-rules.yml | Adds recording rules + alerts for lifecycle latency and revocation spikes. |
| docs/ops/monitoring/grafana-dashboard-deedshield-api.json | Adds dashboard panels for issuance rate, verify p95, revocations, and verification success ratio. |
| docs/ops/monitoring/README.md | Documents new metrics, correlation ID behavior, and cardinality guidance. |
| package-lock.json | Version bump from 0.1.0 → 0.2.0 across workspaces. |
      riskScore: signedReceipt.riskScore
    });

    receiptsIssuedTotal.inc({ decision: signedReceipt.decision, policy_profile: input.policy.profile });

      "uid": "$datasource"
    },
    "editorMode": "code",
    "expr": "histogram_quantile(0.95, sum by (le) (rate(deedshield_verify_duration_seconds_bucket{job=~\"$job\"}[5m])))",

    const requestId = response.headers['x-request-id'] as string;
    // Must be a string without obvious secret patterns (no Bearer, no sha256= etc.)
    expect(requestId).not.toMatch(/^(bearer|sha256=|eyJ)/i);
Adds production-grade observability to the TrustSignal verification lifecycle — request counts, latency histograms, outcome counters, correlation ID propagation, structured audit events, updated alert rules, and Grafana panels.
Metrics (`apps/api/src/server.ts`)
Four new business-level counters/histograms added to the Prometheus registry alongside existing HTTP infrastructure metrics:

| Metric | Labels |
|---|---|
| `deedshield_receipts_issued_total` | `decision`, `policy_profile` |
| `deedshield_receipt_verifications_total` | `outcome` (`verified` or `not_verified`) |
| `deedshield_revocations_total` | |
| `deedshield_verify_duration_seconds` | `decision` |

Correlation IDs
An `onSend` hook propagates Fastify's auto-generated `request.id` as `x-request-id` on every response. Structured log entries for `receipt_issued`, `receipt_verified`, and `receipt_revoked` events carry `request_id` for cross-log correlation.

Audit log events (no secrets, no PII)

    { "event": "receipt_issued", "request_id": "req-42", "receipt_id": "...", "decision": "ALLOW", "policy_profile": "STANDARD_IL", "duration_ms": 312 }
    { "event": "receipt_verified", "request_id": "req-43", "receipt_id": "...", "outcome": "verified", "signature_verified": true, "integrity_verified": true }
    { "event": "receipt_revoked", "request_id": "req-44", "receipt_id": "...", "issuer_id": "issuer-a" }

Alerts & Dashboard
- `alert-rules.yml`: 5 new recording rules + 2 new alerts: `DeedShieldVerifyP95LatencyWarning` (p95 > 3s / 10m) and `DeedShieldRevocationSpike` (> 5 revocations in 5m)
- `docs/ops/monitoring/README.md`: Documents new metrics, label cardinality guidance, and correlation ID behavior
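
The three audit events listed in this description could be modeled as a closed union so that each log line carries only the documented fields. This is a sketch under assumptions: `AuditEvent` and `formatAuditEvent` are hypothetical names, the field types are inferred from the example lines, and the actual PR may emit these through the server's logger rather than `JSON.stringify`.

```typescript
// Event shapes mirror the three example log lines in the PR description.
type AuditEvent =
  | {
      event: "receipt_issued";
      request_id: string;
      receipt_id: string;
      decision: string;
      policy_profile: string;
      duration_ms: number;
    }
  | {
      event: "receipt_verified";
      request_id: string;
      receipt_id: string;
      outcome: "verified" | "not_verified";
      signature_verified: boolean;
      integrity_verified: boolean;
    }
  | {
      event: "receipt_revoked";
      request_id: string;
      receipt_id: string;
      issuer_id: string;
    };

/** Serialize an audit event as a single JSON log line. */
function formatAuditEvent(event: AuditEvent): string {
  return JSON.stringify(event);
}

// "r-1" is a placeholder receipt ID used only for this example.
const line = formatAuditEvent({
  event: "receipt_revoked",
  request_id: "req-44",
  receipt_id: "r-1",
  issuer_id: "issuer-a",
});
console.log(line);
```

Because every variant requires `request_id`, an event cannot be constructed without the correlation ID, and TypeScript's excess-property check flags undocumented fields on object literals at compile time.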